Abstract

Three generations of full-custom analog integrated circuits designed for low-power, high-speed sampling of
Radio-Frequency (RF) transients in excess of the Nyquist minimum have been developed. These 0.25 µm CMOS devices
are denoted the Large Analog Bandwidth Recorder and Digitizer with Ordered Readout (LABRADOR) ASICs, and
the final generation consists of 9 channels, each 260 samples deep. Continuous sampling is provided with common stop capability.
Input analog bandwidth is approximately 1 GHz and sampling speeds are adjustable from 0.02 to 3.7 GSa/s.
Completely parallel internal conversion supports 12-bit digitization and readout of all 2340 cells in under 50 µs.



1. Introduction 

Observation of the early universe through neu- 
trino messengers of the highest possible energies 
requires a detector of enormous instrumented vol- 
ume. One promising means to observe such a large, 
radio-transparent target is viewing the Antarc- 
tic ice shelf via high altitude balloon [1]. Such a 
balloon-borne detector needs hundreds of high- 
speed sampling channels (multi-event buffering), 
operating over a frequency band from 200-1200 
MHz [2]. Since all power must come from solar 
panels, and heat dissipation is a major problem, 
commercial flash ADCs were precluded. 



For at least two decades a number of Switched 
Capacitor Array (SCA) devices have been re- 
ported in the high energy physics literature, for 
example [3,4,5], and many with sampling speeds 
high enough for greater than Nyquist sampling 
of a GHz analog bandwidth signal. These GSa/s 
devices have been used for low and high energy 
neutrino detection [6], particle physics [7,8] and 
gamma-ray astronomy [9]. However, despite such 
high sampling speeds, all of these devices have 
analog bandwidth cutoffs which limit their use at 
UHF frequencies and above. 

We present here the results of three generations 
of a high analog bandwidth ASIC designed to meet 
these instrumentation needs. 
2. Architecture

A number of different CMOS SCA architectures 
have been discussed in the literature. An excellent 
summary of the storage circuit details and perfor- 
mance may be found in Ref. [10]. As will be seen 
below, in order to couple in high analog bandwidth 
it is necessary to limit the parasitic and storage 
capacitance of the SCA array. Thus a compact, 
minimal storage array was considered and initial 
prototyping looked promising [11]. This choice of
a compact storage matrix was guided by, and is
synergistic with, very similar storage architectures
being explored for Monolithic Active Pixel Sensors
(MAPS) [12] for charged particle tracking.

2.1. Theory of Operation 

Employment of SCA techniques in CMOS processes
has been effective in the areas of basic signal
processing, continuous filter design, and
programmable capacitor arrays used for Digital-to-Analog
(DAC) and Analog-to-Digital (ADC) conversion.
As an element of a basic programmable filter,
a simple inline capacitor between two switches
may be used to form a frequency-controlled resistor,
with resistance R given by [13]:



$$ R = \frac{1}{C f_c} \tag{1} $$



for a given capacitor C, switched at frequency
$f_c$ [Hz]. Almost arbitrarily complex filters,
composed of these variable R and C configurations,
can be formed and expressed in terms of
poles and zeros in a transfer function, the mathematics
of which is conveniently described via the
z-Transform [14], a staple of modern signal processing.
As an example, from these simple building
blocks, first-order filters can be constructed as
represented by the transfer function:



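$$ H(z) = K \, \frac{b_0 + b_1 z^{-1}}{1 + a_1 z^{-1}} \tag{2} $$

where $z^{-1} = e^{-i\omega T}$, $T = 1/f_c$, $-1 < a_1 < 1$, and
K is an overall normalization constant. Through
choice of constants, one can form a low-pass filter
($b_0 - b_1 = 0$):

$$ H(z) = K \, \frac{1 + z^{-1}}{1 + a_1 z^{-1}} \tag{3} $$

or a high-pass filter ($b_0 = -b_1$):

$$ H(z) = K \, \frac{1 - z^{-1}}{1 + a_1 z^{-1}} \tag{4} $$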
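As a numerical illustration of the switched-capacitor resistor and the first-order transfer function above, the following minimal Python sketch evaluates both. The component values (1 pF, 100 MHz clock, $a_1 = -0.5$) are arbitrary assumptions chosen for illustration and are not taken from the LABRADOR design; the function names are hypothetical, and the low-pass and high-pass cases are taken here as $b_0 = b_1$ and $b_0 = -b_1$.

```python
import cmath
import math


def sc_resistance(c_farads: float, f_clock_hz: float) -> float:
    """Equivalent resistance of a switched capacitor, R = 1 / (C * f_c)."""
    return 1.0 / (c_farads * f_clock_hz)


def first_order_response(freq_hz, f_clock_hz, b0, b1, a1, k=1.0):
    """Magnitude of H(z) = K (b0 + b1 z^-1) / (1 + a1 z^-1) at one frequency."""
    # z^-1 = exp(-i w T) with T = 1 / f_c
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / f_clock_hz)
    return abs(k * (b0 + b1 * z_inv) / (1.0 + a1 * z_inv))


if __name__ == "__main__":
    # Illustrative values only (assumptions, not from the source).
    f_c = 100e6
    print(f"R_eq = {sc_resistance(1e-12, f_c):.0f} Ohm")
    for label, b1 in (("low-pass (b0 = b1)", 1.0), ("high-pass (b0 = -b1)", -1.0)):
        dc = first_order_response(0.0, f_c, 1.0, b1, a1=-0.5)
        nyquist = first_order_response(f_c / 2.0, f_c, 1.0, b1, a1=-0.5)
        print(f"{label}: |H(DC)| = {dc:.2f}, |H(f_c/2)| = {nyquist:.2f}")
```

With these assumed coefficients the zero sits at $z = -1$ for the low-pass choice and at $z = +1$ for the high-pass choice, so the response vanishes at half the clock frequency or at DC, respectively.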



Since first-order filters have only one real pole, they
cannot directly realize band-pass or notch filters.
More flexible and universal are filters of second
order and beyond. Second-order SC filters are often
called biquad circuits and may be expressed
as



$$ H(z) = K \, \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} \tag{5} $$



which is analogous to the continuous-time case, where
the transfer function may be represented by

$$ H(s) = K \, \frac{s^2 + d_1 s + d_0}{s^2 + c_1 s + c_0} \tag{6} $$

and where, as long as the sampling frequency is much
higher than the signals of interest, the approximation
$z^{-1} \approx 1 - i\omega T$ may be used. From this
point, standard pole-zero analysis can be used.
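To give a feeling for how much faster than the signal the switching clock must be for the linearized $z^{-1}$ to hold, the short Python sketch below compares the exact $e^{-i\omega T}$ with $1 - i\omega T$. The function name and the example frequencies are assumptions made for illustration only.

```python
import cmath
import math


def z_inverse_error(signal_hz: float, f_clock_hz: float) -> float:
    """Relative error of approximating z^-1 = exp(-i w T) by 1 - i w T."""
    w_t = 2.0 * math.pi * signal_hz / f_clock_hz  # w T, with T = 1 / f_c
    exact = cmath.exp(-1j * w_t)
    approx = 1.0 - 1j * w_t
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    f_c = 100e6  # illustrative clock frequency (assumption)
    for f_sig in (1e6, 10e6):
        err = z_inverse_error(f_sig, f_c)
        print(f"f/f_c = {f_sig / f_c:.2f}: relative error = {err:.4f}")
```

The error grows roughly as $(\omega T)^2/2$, so a clock one hundred times faster than the signal keeps it well below one percent, while a factor of ten does not.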

Beyond simple synthesis of rather complex fil- 
ters using standard tools, the true power of this 
technique lies in pairing such SCA processing with 
operational amplifiers on an integrated circuit to 
achieve powerful sampling and signal manipulation 
capabilities. For instance, in analogy with an R-2R 
ladder topology, a multiplying DAC may be ex- 
pressed using an array of switches and capacitors 
with the simple transfer function [15] 



H(z) 



(7) 



and, ignoring the half-period delay indicated by the
$z^{-1/2}$, can synthesize an output voltage $v_{out}$ based
upon a reference voltage $v_{ref}$ via the expression



Vout 



= V ref ' 



(8) 



i=l 



with the $b_i$ being the binary-coded digital signal,
precisely as expected for a DAC. ADC topologies
are now myriad, and the focus of this paper is on
a specific type: the transient waveform recorder.






In some ways this makes use of the simplest SCA 
structure of them all, the Sample-and-Hold (S/H)
circuit. The great power of the papers referenced 
above derives from the ever increasing speed and 
compactness of deep submicron CMOS processes. 

While an idealized waveform recorder is simply 
an array of S/H circuits, parasitic capacitances re- 
quire consideration of parasitic circuits like those 
referenced above. For the specific application at 
hand, parasitic inductances and capacitances are 
critical to storing analog waveforms with frequency 
content in the Giga-Hertz range. 

2.2. Bandwidth Limitations 

In order for an SCA storage device to be useful, 
it must have a decent number of storage cells. Load 
capacitance increases as a function of the number 
of switches connected to the incoming signal line, 
as well as the resistance-shielded storage capacitances
when the switches are closed. For a properly
coupled 50 Ω stripline into an ASIC, with a purely
capacitive storage array, the 3 dB roll-off is given by

$$ f_{3dB} = \frac{1}{2\pi Z C} \tag{9} $$

Therefore, to obtain a 3 dB bandwidth of 1.2 GHz,
the pure capacitance must be limited to approximately
2.65 pF. This value is already smaller than
that of the high-ESD protection diodes (~10 pF)
provided in the standard design library used, and
therefore the input protection must be modified.
A more accurate assessment of the input coupling
performance requires a refined description of the
input circuit model and will be discussed in much
more architecture-specific detail below. In summary,
to realize 1 GHz of analog bandwidth with
good coupling, we use the following design principles:

(i) 50 Ω stripline everywhere

(ii) minimize input protection capacitance

(iii) minimize switch drain and storage capacitance
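The capacitance budget quoted above follows from the single-pole RC roll-off of a 50 Ω source driving a purely capacitive load. A minimal Python sketch of that arithmetic is given here; the function names are hypothetical, and the single-pole model is the same simplification the text uses, not a full model of the input network.

```python
import math


def rolloff_frequency_hz(z_ohms: float, c_farads: float) -> float:
    """3 dB roll-off of a source impedance Z driving a pure capacitance C."""
    return 1.0 / (2.0 * math.pi * z_ohms * c_farads)


def max_capacitance_f(z_ohms: float, f3db_hz: float) -> float:
    """Largest capacitance that still gives the requested 3 dB bandwidth."""
    return 1.0 / (2.0 * math.pi * z_ohms * f3db_hz)


if __name__ == "__main__":
    c_max = max_capacitance_f(50.0, 1.2e9)
    print(f"C_max for 1.2 GHz: {c_max * 1e12:.2f} pF")
    # Bandwidth if the ~10 pF library ESD diodes were left in place.
    f_esd = rolloff_frequency_hz(50.0, 10e-12)
    print(f"roll-off with 10 pF: {f_esd / 1e6:.0f} MHz")
```

Under this simple model, a 10 pF protection structure alone would limit the bandwidth to a few hundred MHz, which is why the input protection has to be modified.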

Based upon these considerations, efforts have
been made to maintain a 50 Ω coupling across the
sampling array inside the ASIC. As a trade-off between
storage depth and parasitic drain capacitance,
256 samples per input was chosen. Finally,
the size of the storage capacitance was studied.

2.3. Storage Limitations 

Limits are imposed on the minimum size possible
for the storage capacitor. Since for a S/H circuit
there is no means to perform Correlated Double
Sampling, an ambiguity in the actual stored value
is given in terms of electron counting statistics by
the usual expression

$$ v_{rms} = \sqrt{\frac{kT}{C}} \tag{10} $$

where k is Boltzmann's constant and temperature T
is in Kelvin. Matching the 12 bits of dynamic range
of the sample storage conversion to a Wilkinson
ramp voltage range of about 1 V, a slope of approximately
0.25 mV/least count is realized. At this level
of sensitivity, the impact of the choice of storage
capacitance value is seen in Fig. 1. At the upper
left of this figure is a schematic representation of
the basic storage cell for the first two generations of
ASIC, which utilized a transimpedance storage architecture.
A reference value of 78 fF is shown, about
matching the least count of the ADC.
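The relation between storage capacitance and noise-limited resolution can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses the standard kT/C sampling noise, assumes a temperature of 300 K (the text does not quote one), and takes the roughly 1 V ramp range as full scale; the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def ktc_noise_volts(c_farads: float, temp_kelvin: float = 300.0) -> float:
    """RMS sampling noise sqrt(kT/C) on a storage capacitor."""
    return math.sqrt(BOLTZMANN_J_PER_K * temp_kelvin / c_farads)


def resolution_bits(c_farads: float, full_scale_volts: float = 1.0,
                    temp_kelvin: float = 300.0) -> float:
    """Noise-limited resolution, log2(full scale / RMS noise)."""
    return math.log2(full_scale_volts / ktc_noise_volts(c_farads, temp_kelvin))


if __name__ == "__main__":
    for c_ff in (20.0, 78.0, 300.0):
        noise_mv = ktc_noise_volts(c_ff * 1e-15) * 1e3
        bits = resolution_bits(c_ff * 1e-15)
        print(f"C = {c_ff:5.0f} fF: noise = {noise_mv:.3f} mV, ~{bits:.1f} bits")
```

At 78 fF the assumed 300 K noise comes out near 0.23 mV, close to the 0.25 mV least count of a 12-bit conversion over about 1 V.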



[Figure 1 plot: "Impact of Storage Cap size"; horizontal axis: Storage Cap [fF].]

Fig. 1. Noise limited sampling resolution as a function of
storage capacitance value.

In the last generation a different readout is em- 
ployed, although the same basic NMOS transistor 
gate capacitance storage is used. This constraint 
on minimum size is subsequently considered in the 
choice of storage capacitance. However, reducing
the storage capacitance too much makes switch 
charge injection and leakage current effects more 
prominent. 
2.4. Architectural Details

Three generations of LABRADOR architecture 
ASIC have been designed, fabricated and tested. 
Their key features are summarized in Table 1. All
have been fabricated in the TSMC 0.25 µm CMOS
(LO) process and have been packaged in a 100-pin
plastic TQFP package. Economics and package
performance simulations [11] drove this decision.
BGA packages were considered and may be used
in the future to reduce the contribution due to lead
inductance; however, all test results are shown for
this same 16.6 x 16.6 mm plastic package.

Table 1

Summary of three LABRADOR generations, where for
brevity they will be referred to by a shortened designation;
e.g. LABRADOR1 = LAB1.

Item                     LAB1    LAB2    LAB3
# of RF inputs           8       8       9
Samples/input            256     256     260
Total samples            2048    2048    2340
# of ADCs                128     128     2340
ADC conversion cycles    16      16      1
Readout latency [µs]     2200    2200    < 50
Analog MUX out [µs]      25.6    25.6    N/A
DC GND ref.              no      yes     no
Analog out               yes     yes     no
50 Ω term.               end     end     input



In contrast to the first two generations of
LABRADOR ASIC, the third generation is a
purely digital output device, moves the termination
to the array input, and uses a
massive array of Wilkinson ADCs (one per pixel).
These differences and lessons learned will be
highlighted below. The architecture of the first two ASICs
is illustrated schematically in Fig. 2. Examining
Table 1, the primary difference between LAB1
and LAB2 was the attempt to provide a means to
internally bias the RF inputs. This circuit did not
work well due to high resistance noise coupling.
LAB1 results are similar, though better in all
cases. Eight RF input channels are each sampled
by an array of 256 SCA storage cells. Sampling occurs
continually until a trigger signal is generated.
[Figure 2 diagram label: SCA bank: 4 rows x 256 columns.]



Fig. 2. Block diagram of the LAB1/LAB2 architecture. 
Samples are stored for the 8 RF inputs in an array of 256 
storage cells. Writing is controlled by a write pointer that 
continuously cycles across the array and stored values are 
held upon receipt of trigger signal. Stored values are then 
addressed (gain adjusted) and either stored for conversion 
in an array of 128 Wilkinson ADCs or multiplexed off-chip 
for external conversion. 

At this point the analog samples are held and not 
overwritten. These stored values are then selected 
and a transimpedance relay of the stored charge 
is made, which is either stored into input samples 
of an array of 128 channels of Wilkinson ADC or 
analog multiplexed and transferred off-chip for external
ADC conversion. A die photograph of the
approximately 10 mm² LAB1 device is shown in
Fig. 3.

Fig. 3. A die photograph of the LABRADOR1 ASIC.
[Figure 4 diagram: "LABRADOR Resistance Estimate". Schematic of the RF and reference input paths (bond wire, pad, M5-M4 signal trace with Z0 = 13 Ω, vias, 28 Ω terminator) with a per-element resistance table. Legible entries: Metal 4 sheet = 0.07 Ohm/sq, Metal 5 sheet = 0.03 Ohm/sq, poly contact = 5.1 Ohm, via 1 = 2.7 Ohm, via 2 = 5.35 Ohm, via 3 = 8.26 Ohm, via 4 = 11.34 Ohm; measured grand total: 50.0 Ohm.]

Fig. 4. Schematic representation and resistance breakdown of the LAB1 signal chain. Effects due to both resistive drop
across the sampling array, as well as low impedance of the on-chip stripline, were observed in testing.



Layout of the LAB1 ASIC quite directly follows
the arrangement of the functional blocks in the
schematic diagram. While efforts were made to optimize
the coupling of the input signal based on earlier
efforts with the STRAW [11] architecture, the
choice of LAB1 input structure represented a compromise,
as shown in Fig. 4. Signals enter in a straight
shot from the left and are terminated in a 28 Ω resistor
at the right. This choice is a trade-off between
widening the signal trace, which would lower the
microstrip impedance even below the $Z_0 = 13\,\Omega$
shown, and having even larger resistive losses across
the array. These resistive losses made for a vexing
amplitude dependence across the array. To address
this issue in LAB3, a 50 Ω termination resistor
is placed directly at the input to the detector.
The termination resistor was removed from the array
end. Since offset biasing could be performed
directly at this input termination, the resistance
of the signal line was unimportant and the on-chip
stripline could be made exactly $Z_0 = 50\,\Omega$. Any
reflection at the end of the array would be back-terminated,
though this stub is short. At the maximum
signal frequency of 1.2 GHz, for a stripline 2 mm
long (about 10 ps at $v_{prop} \approx \frac{2}{3}c$), the phase introduced
by this stub is about



lOps 



(1.2GHz)- 



(360° 



.6° (11) 



which is acceptable, though for operation at higher
frequencies such effects may be non-negligible. In
all cases the input protection diodes have been
completely removed. Current discharge is provided
through a 20 kΩ pull-down resistor to ground and
voltage clamping is provided by external back-to-back
RF diodes.

Other lessons gleaned from the first two 
LABRADOR generations included observing that 
while having analog samples available for exter- 
nal digitization has merits, non-linearities in the 
transimpedance response and temperature depen- 
dence were major issues. As space was available 
to permit completely parallel conversion of all 9 
channels by 260 samples, in-situ conversion was 
adopted, as illustrated in Fig. 5. Including four 
extra "tail" samples avoids a sampling record gap 
during the interval in which the write pointer is re- 
turning to the beginning of the sampling window. 
[Figure 5 diagram: LABRADOR(3) architecture; SCA bank: 5 rows x 260 columns.]

Fig. 5. Block diagram of the LAB3 architecture. In contrast to the LAB1/LAB2, the stored analog signal is never transferred.
Instead, direct Wilkinson conversion is done within each storage cell.



Details of the required timing and offset calibrations
are discussed below. In order to accommodate
the additional samples, as well as provide
space for a Wilkinson comparator and 12-bit latch
in each pixel storage cell, the die had to increase
slightly to approximately 3.2 by 2.8 mm. Metal
fill rules required covering the interesting parts of
the die, making LAB3 far less photogenic than
LAB1/LAB2, and thus a photograph is not included.
Addition of a 9th channel was done to allow insertion
of a common reference clock into the data stream for each
LABRADOR. This was found to be useful for improving
the temporal alignment of waveforms recorded
by different chips.

All three generations use the same write pointer
structure. This is a classical voltage-controlled inverter
chain, with an odd number of stages such
that a ripple continuously propagates. An XOR circuit
and a look-ahead signal are used to open each
storage gate for the time it takes to transition from
the look-ahead to the current location (4-6 samples).

Despite best efforts at balancing the threshold 
voltage and NMOS versus PMOS L:W ratios, some 
amount of propagation variation is expected when 
the ripple edge across the array is transitioning 
low-to-high versus high-to-low, as shown below. 



The ramping voltage for Wilkinson conversion
is generated by using a current source and either
an internal or external reference capacitor. In all
testing shown below, an external 200 pF capacitor
is used. An external (68 kΩ) bias resistor sets the
drive strength of the current source to approximately
20 µA. A common Gray-code counter is
provided on chip and broadcast to all SCA cells.
When the ramp threshold is crossed in a particular
cell, the current count value is latched. Upon
completion of ramping, all 2340 12-bit values are
available for random-access readout.
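The Wilkinson scheme described above can be sketched in a few lines of Python: a constant current into a capacitor gives a linear ramp, and each cell latches the broadcast Gray-code count at the moment the ramp crosses its stored voltage. The 20 µA and 200 pF values are from the text; the 400 MHz counter clock, the function names and the ideal comparator are assumptions made purely for illustration.

```python
def ramp_slope_v_per_s(current_a: float, cap_f: float) -> float:
    """Slope of a ramp made by a constant current charging a capacitor."""
    return current_a / cap_f


def to_gray(n: int) -> int:
    """Binary to reflected Gray code (one bit changes per count)."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Reflected Gray code back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def latched_codes(cell_volts, slope_v_per_s, clock_hz, bits=12):
    """Gray-coded count latched by each cell when the ramp crosses its voltage."""
    max_count = (1 << bits) - 1
    codes = []
    for volts in cell_volts:
        crossing_s = volts / slope_v_per_s       # ideal comparator, no offset
        count = min(max(int(crossing_s * clock_hz), 0), max_count)
        codes.append(to_gray(count))
    return codes


if __name__ == "__main__":
    slope = ramp_slope_v_per_s(20e-6, 200e-12)   # values quoted in the text
    print(f"ramp slope = {slope * 1e-6:.2f} V/us")
    print(f"time to ramp 1 V = {1.0 / slope * 1e6:.1f} us")
    clock_hz = 400e6                             # hypothetical counter clock
    codes = latched_codes([0.1, 0.5, 0.9], slope, clock_hz)
    print([from_gray(code) for code in codes])
```

A Gray-code counter is a natural choice for this kind of asynchronous latching because only one bit changes per count, so a latch that fires during a transition is off by at most one count.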



2.5. Design Evolution 

In summary, the biggest changes in going from 
the LAB1/LAB2 architecture to the LAB3 are 

(i) direct termination at array input 

(ii) Wilkinson conversion in each storage cell (no 
analog signal transfer) 

(iii) addition of a 9th (clock reference) channel 

and by these choices good performance results 
have been obtained, as documented below. 
3. Test Results

A variety of tests have been carried out to evaluate
the performance of the LABRADOR series of
waveform recorders. These measurements attempt
to verify the degree to which the performance targets
have been met, as well as characterize the system
in preparation for UHF radio transient detection.
Because of its superior measured noise,
bandwidth, linearity and digitization performance,
results are shown for the LAB3 ASIC.

3.1. Sampling Speed

The sampling speed dependence on an adjustable
control voltage (ROVDD) is plotted in
Fig. 6. Stable sampling speeds ranged from 0.02
to almost 4 GSa/s (limited by operation beyond
the 2.5 V nominal VDD rail voltage).

Fig. 6. Sampling rate as a function of control voltage. Both
data and SPICE simulation are plotted, where a difference
is observed between rising or falling edges of the ripple
oscillator as described in the text.

The SPICE simulation was fairly conservative
and should be considered a lower limit, pessimized
for a worst-case spread in actual CMOS fabrication
parameter values. While the sampling rate is
defined as the cycle average of the so-called Ripple
Carry Out (RCO), which is a copy of the write
pointer monitored external to LAB3, the propagation
speeds of the high-to-low and low-to-high transitions are
seen to be different. At a nominal 2.6 GSa/s this
corresponds to about a 2% effect and is readily calibrated
out by latching the RCO bit state at the
time a trigger is recorded, as will be discussed later.

3.2. Input Coupling



Pulsing the input to the LAB3 chip with a fast
risetime signal, a reflection R = +6.8% is observed.
Solving the usual expression

$$ R = \frac{Z - Z_0}{Z + Z_0} \tag{12} $$

an impedance value of Z = 57 Ω is determined.
This is consistent with the measured 59 Ω DC resistance
of the fabricated device, which appears to
be about 20% higher than specified, though within
spreads observed for silicide block in recent similar
runs.
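The step from measured reflection to impedance is a one-line inversion of the reflection formula. A minimal Python sketch, assuming a 50 Ω reference impedance for the test setup (the function names are hypothetical):

```python
def reflection_coefficient(z_load: float, z0: float = 50.0) -> float:
    """Voltage reflection coefficient R = (Z - Z0) / (Z + Z0)."""
    return (z_load - z0) / (z_load + z0)


def impedance_from_reflection(r: float, z0: float = 50.0) -> float:
    """Invert R = (Z - Z0) / (Z + Z0) for the load impedance Z."""
    return z0 * (1.0 + r) / (1.0 - r)


if __name__ == "__main__":
    z = impedance_from_reflection(0.068)
    print(f"Z for R = +6.8%: {z:.1f} Ohm")
    # Reflection expected if the load were exactly the measured DC resistance.
    print(f"R for Z = 59 Ohm: {reflection_coefficient(59.0) * 100:.1f}%")
```

Inverting the observed +6.8% gives roughly 57 Ω, in line with the value quoted in the text.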

Because the signal of interest is an RF signal, 
a standard DC linearity scan performance is less 
important than evaluation with a realistic impul- 
sive signal. Therefore, to determine the input cou- 
pling, linearity and cross-talk performance, an RF 
impulse was used as shown in Fig. 7. Most of the 
signal power of interest is in the steep high-to-low 
transition. 



[Figure 7 plot: "Cross-talk and Gain Reference pulse".]

Fig. 7. Time-domain signal of the RF pulse used to evaluate
input coupling, linearity and crosstalk, as recorded with a
3 GHz bandwidth oscilloscope.

A 3 GHz analog bandwidth oscilloscope was used
to record this reference signal. However, the signal
from the pulse generator itself was not flat
in the frequency domain. Moreover, this reference
pulse has been bandwidth limited to 200-1200 MHz,
to match the frequency range of the
ANITA instrument signal chain, in which these
measurements have been performed.
Because determination of the analog bandwidth
of the LAB3 device requires removing the intrinsic
frequency content of the RF pulse itself, its FFT has
been measured and is displayed as the blue curve
in the upper plot of Fig. 8. In red in this upper plot is the recorded
LAB3 response.



[Figure 8 plots: "Bandwidth Determination".]

Fig. 8. Determination of the LAB3 ASIC analog bandwidth
in a test board configuration with a four-way split of
the RF signal. At top: RF reference pulse FFT (blue) and
LAB3 FFT (red). At bottom is the difference, where for perfect
coupling a -6 dB loss would be expected.
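The -6 dB reference level is simply the power division of an ideal, lossless four-way split. A small Python sketch of that arithmetic follows; the function names and the -9 dB example reading are illustrative assumptions, not measured values.

```python
import math


def ideal_split_loss_db(n_ways: int) -> float:
    """Power delivered to one output of an ideal lossless n-way split, in dB."""
    return 10.0 * math.log10(1.0 / n_ways)


def excess_loss_db(measured_db: float, n_ways: int) -> float:
    """Loss beyond the unavoidable splitting loss (negative means extra loss)."""
    return measured_db - ideal_split_loss_db(n_ways)


if __name__ == "__main__":
    print(f"ideal 4-way split: {ideal_split_loss_db(4):.2f} dB")
    # Hypothetical reading of -9 dB on the difference curve.
    print(f"excess loss at -9 dB: {excess_loss_db(-9.0, 4):.2f} dB")
```

Any point of the difference curve below about -6 dB therefore reflects loss in the signal chain or in the coupling into the chip rather than the split itself.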



Taking the difference of these two curves, the
analog response versus frequency is determined
and shown in the bottom plot of Fig. 8. At the
left edges of the curves the impact of the 200 MHz
high-pass band definition filters is seen. Of note
is peaking of the signal in the 300-400 MHz range,
an effect seen in earlier testing. Taking the -3 dB
point as the line shown, the roll-off frequency is
just over 900 MHz, though signal power is still
available out to 1200 MHz. Four LAB3 are being
tested in parallel and thus an ideal loss would be
-6 dB, indicating some amount of loss in the RF
signal chain and coupling into the chip. Earlier
tests on a dedicated, single LAB3 board, without
band definition filters (e.g. 1200 MHz low pass),
indicated somewhat better higher frequency response,
and some of this loss may be due to components
on the ANITA flight digitizer (SURF [2])
board used for evaluation. Therefore this curve
may be considered a conservative lower bound on
the analog bandwidth.

We note that the peaking observed is also
present in the case of Gaussian noise, though the
peak of the distribution is a function of the input
biasing network. This is likely due to resonant L-C
response in the input front end and seems coupled
to the cross-talk observed below.

3.3. Linearity

A determination of the linearity of the digitizing
system has been made by varying the RF signal
amplitude as displayed in Fig. 9.

[Figure 9 plot: "Linearity scan".]

Fig. 9. Linearity determined by attenuating an RF pulse
as described in the text.



Since power attenuators are used, the response 
is characterized in dB and a linear fit is observed on 
a logarithmic plot. Good linearity is seen with just 
a hint of saturation at large signal amplitudes and 
some non-linearity at small signal amplitude due 
to the coaddition of board-level noise. Any non- 
linearity observed is likely due to non-linearities 
in the ramp generation circuit or comparator bias 
setting. Over a span of 40dB in dynamic range, 
the LAB3 output tracks input to within statistical 
measurement errors. 
3.4. Crosstalk

By inserting the signal successively into the 
LAB3 channels, a cross-talk correlation plot was 
constructed as shown in Fig. 10. 
Fig. 10. Measured crosstalk for each channel as a function
of the channel into which signal is injected. For signal in the
self-channel, the amplitude is unity. Note that these values are
overestimated, as described in the text.

The values shown are determined by searching
for a peak around the time of the input signal. Due 
to noise, statistically a few percent peak is mea- 
sured even for the case of no cross-talk. Therefore 
the values shown are overestimated. For RF appli- 
cations, even a 10% voltage crosstalk is only 1% in 
power. 
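The voltage-to-power statement is just the square law, which is easy to tabulate. The Python sketch below assumes the crosstalk and the signal are measured across the same impedance, so that power scales as voltage squared; the function names are hypothetical.

```python
import math


def power_fraction(voltage_fraction: float) -> float:
    """Power ratio corresponding to a voltage ratio into the same impedance."""
    return voltage_fraction ** 2


def voltage_ratio_db(voltage_fraction: float) -> float:
    """The same ratio expressed in decibels (20 log10 of the voltage ratio)."""
    return 20.0 * math.log10(voltage_fraction)


if __name__ == "__main__":
    for v in (0.10, 0.05, 0.02):
        print(f"{v * 100:4.0f}% voltage -> {power_fraction(v) * 100:.2f}% power "
              f"({voltage_ratio_db(v):.1f} dB)")
```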

Nevertheless, for other applications it is impor- 
tant to understand the source of this effect. A hint 
to the origin of this crosstalk may be seen in Fig. 11.

Similar temporal and frequency dependence to 
the cross-talk can be reproduced in SPICE simu- 
lations, though the solutions are not unique. That 
is, the amplitude and phase information can be 
mimicked by tuning the voltage source output in- 
ductance of the pedestal network or with respect 
to bond-wire inductance stray coupling. Based on 
these results, a channel-dependent phase-lag to the 
cross-talk was predicted and subsequently verified 
qualitatively, as shown in Fig. 12. 
Fig. 11. Schematic representation of the input bias circuit.

Fig. 12. Phase lag of the measured crosstalk. For channels
separated by the timing control section of the chip both
the amplitude and phase are less well constrained.



In addition, a small component of direct radiative
coupling between the on-chip striplines cannot
be ruled out, though this was difficult to model (metallic
heat sinks would need to be taken properly
into account in the 3D EM simulations). All results
indicate that better packaging (lower inductance)
and stripline shielding would help improve the observed
effects.
4. Required Calibrations and Stability

In order to obtain the test results shown, a number
of calibrations are needed. In the process of
applying these, much improved resolution is obtained.
Temperature dependence and timing precision
limits are considered.

4.1. Gain and Pedestal Calibration

For the measurements shown, the gain has
been adjusted to approximately 1 mV/least count.
A comprehensive pedestal histogram of all SCA
storage channels (excluding channel 9) on the 36
LAB3 flown on ANITA is summarized in Fig. 13.

Fig. 13. A summary of the pedestal values (in mV) for all
SCAs of 36 production LAB3 tested.

Channel 9 is excluded since it has a different voltage
offset value due to the clock input biasing. The
spread seen is a combination of 36 pedestal voltage
differences, SCA-SCA variations, and Wilkinson
ramp slope and starting voltage offsets. Also, one
LAB3 (values clustered around 1100)
had an anomalously low gain. Overall the RMS of
this distribution is just over 4%.

4.2. Timing Calibrations



In order to obtain the best possible timing reso- 
lution, a number of calibrations, due to the method 
in which the sampling is implemented, must be 
considered. As mentioned earlier, the write pointer 
is monitored using a copy of the signal called RCO. 
Since the sampling is done in so-called Common 
Stop mode, it is continuous until a trigger condi- 
tion is formed. Thus all samples have already been 
recorded by the time a trigger is acted upon. In or- 
der for sampling to be continuous it is necessary for 
the write pointer to wrap around from the end of 
the array to the beginning, as illustrated in Fig. 14. 




Fig. 14. Write pointer wrap around. While the write pointer
returns to position 0 of the array, additional tail samples
are taken in order to avoid a gap in the sampling record.

Four additional "tail" samples are provided to 
permit samples to be recorded during the time in 
which the write pointer is returning to the begin- 
ning of the array. Even though the physical dis- 
tance is only 20-30ps at the speed of light, the 
need to go through an additional inverting stage
(to form a ring oscillator) and the capacitance associated
with the long signal line back to the beginning
of the array limit the speed of write pointer
return.

Also mentioned earlier, the write pointer speed 
of propagation across the array is a function of 
the transition direction. Likewise the delay time of 
write pointer return is also RCO phase dependent. 
The most general case of these calibration constants
is illustrated in Fig. 15. From the measured
RCO frequency ($f_{RCO}$), the sampling frequency is
determined as

$$ f_{sampling} = 2 \times 256 \times f_{RCO}. \tag{13} $$

Fig. 15. Definition of the most general LAB3 sample timing relationships and constants. Determination of their values is
described in the text.



Expressing $f_{RCO}$ in terms of its period T, half
period $T_0$ corresponds to RCO phase 0 and half
period $T_1$ corresponds to RCO phase 1, or

$$ f_{sampling} = 512 \times (T_0 + T_1)^{-1} \tag{14} $$

in which case the time step of an individual sample
is expressed as

$$ \Delta t = \frac{T}{512}. \tag{15} $$

In general, as mentioned, the half periods $T_0$ and
$T_1$ are not half the period T:

$$ T_0 \neq T_1 \neq \frac{T}{2} \tag{16} $$

which means that the average individual time
steps in phase 0 ($\Delta t_0$) are different from those in
phase 1 ($\Delta t_1$). Likewise the delay times of the write
pointer propagation for RCO 0 → 1 ($\epsilon_1$) and for RCO
1 → 0 ($\epsilon_0$) are in general different and related to the
difference between average $\Delta t_0$ and $\Delta t_1$. Finally,
due to transistor threshold dispersion, the actual
widths of each of the time bins ($\Delta t_{0-259}$) can be
slightly different.
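The relationships among the RCO half periods, the sampling frequency and the average time steps can be collected in a short Python sketch. It assumes that each RCO half period spans 256 cells, so that the average step in a given phase is the corresponding half period divided by 256, and it ignores the wrap-around delays and tail samples; the function name, the dictionary keys and the choice of which phase is slower are illustrative assumptions.

```python
def rco_timing(t0_s: float, t1_s: float, cells_per_phase: int = 256) -> dict:
    """Derive sampling constants from the two RCO half periods T0 and T1."""
    period_s = t0_s + t1_s
    n_cells = 2 * cells_per_phase              # 512 samples per RCO period
    return {
        "f_sampling_hz": n_cells / period_s,   # 512 / (T0 + T1)
        "dt_mean_s": period_s / n_cells,       # T / 512
        "dt0_s": t0_s / cells_per_phase,       # average step, RCO phase 0
        "dt1_s": t1_s / cells_per_phase,       # average step, RCO phase 1
    }


if __name__ == "__main__":
    # Illustrative case: nominal 2.6 GSa/s with half periods differing by ~2%.
    period = 512 / 2.6e9
    timing = rco_timing(0.505 * period, 0.495 * period)
    print(f"f_sampling = {timing['f_sampling_hz'] / 1e9:.3f} GSa/s")
    print(f"mean step  = {timing['dt_mean_s'] * 1e12:.1f} ps")
    print(f"dt0 = {timing['dt0_s'] * 1e12:.1f} ps, dt1 = {timing['dt1_s'] * 1e12:.1f} ps")
```

In a real calibration the two half periods would come from the measured RCO signal, and the per-phase step would be selected using the RCO bit latched with each trigger.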

Using a known periodic input signal, it is possible
to generate calibration values for all of these
parameters. An example of determination of the
relative average $\Delta t_0$ and $\Delta t_1$ is shown in Fig. 16.

Fig. 16. Measurement of the write pointer propagation
(sampling speed) difference for the RCO = 0 (top) and
RCO = 1 (bottom) phases for a 200 MHz reference clock.

In each case the variable parameter is tuned un- 
til the spread or offset in the determined period is 
minimized. Because the period is well determined, 
the procedure is very efficient and requires a rela- 
tively small amount of calibration data. 
Similarly, the write pointer wrap around delays,
$\epsilon_0$ and $\epsilon_1$, may be determined by constraining the
measured period to be consistent across the write
pointer wrap around. An example is shown in
Fig. 17.

[Figure 17 plots: "Wrap Offset - Phase 0 to 1" and "Wrap Offset - Phase 1 to 0".]

Fig. 17. Extraction of the wrap timing offsets ($\epsilon_0$ and $\epsilon_1$)
for a given LAB3.

To a certain extent these calibration steps must 
be bootstrapped. For example, correctly minimizing
the error on these $\epsilon$ parameters requires that
the average time steps in each of the RCO phases 
be correctly determined. A subtlety here is that the 
RCO phase is recorded at the time a trigger signal 
(hold) is issued. Because the RCO latching in the 
data is not completely synchronous, there is in gen- 
eral a delay between the measured value of RCO 
and its actual value. This ambiguity is resolved 
by assigning a phase delay between the measured 
RCO that depends upon the address at which the 
hold was issued, the so-called "HitBus" value. The 
value of this delay is tuned in the data until the 
width of the measured period is again minimized. 

Finally, using a high frequency clock it is possible
to constrain the average half period and assign
its average value to the $\Delta t_{0 \ldots 259}$ bin in which the
positive/negative lobe peaks. Using this prescription
the distribution histogrammed in Fig. 18 is
obtained.




Fig. 18. Summary distribution of the calibrated individual
time bin widths for all SCAs in 36 LAB3 ASICs.

4.3. Time Resolution Limitations 



Applying these timing corrections to the data 
leads to an improvement in the time resolution of 
signals in the data. The precise improvement de- 
pends upon the signal distance within the window 
(cumulative error) and method for correlating sig- 
nal shapes to extract a timing feature for compar- 
ison. 

To understand the intrinsic performance limits
and the significance of the bin-by-bin correction,
a simple Monte Carlo (MC) study was performed to
determine the extent to which the technique used
to extract the timings would lead to the
observed distribution. A completely random scatter
(uniform distribution) of 15% was introduced to
the nominal 386ps bin width, a 600MHz sine wave MC
sample was then synthesized, and the algorithm applied.
The value of 15% was determined empirically to
provide a good representation of the observations
in data. Due to irreducible errors in the specific
implementation of this zero-crossing technique,
application of these constants improves the timing
resolution to about 28ps, as shown in Fig. 19.
This is an improvement of about 20ps in quadrature,
though perhaps there is still room for improvement.
Since two edges are used to determine this time
interval, the single edge measurement resolution is
about 28ps/$\sqrt{2}$, or about 20ps, and is
probably a limit with the current LAB3.
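
A toy version of such a Monte Carlo study is sketched below. It is not the authors' simulation and is not expected to reproduce the 28ps figure: it assumes the 15% scatter means a uniform spread of +/-15% about the nominal width, uses linear interpolation between the two samples around each rising zero crossing, and includes no voltage noise. All function names are hypothetical.

```python
import math
import random
import statistics

NOMINAL_DT_PS = 386.0   # nominal bin width quoted in the text
SCATTER = 0.15          # assumed to mean a uniform +/-15% spread
F_GHZ = 0.6             # 600 MHz sine wave
N_CELLS = 260

def rising_crossings(times, volts):
    """Times of rising zero crossings, by linear interpolation."""
    out = []
    for i in range(len(volts) - 1):
        a, b = volts[i], volts[i + 1]
        if a < 0.0 <= b:
            out.append(times[i] + a / (a - b) * (times[i + 1] - times[i]))
    return out

def period_spread(n_events=200, seed=1):
    """RMS spread of the measured period, before and after bin-width correction."""
    rng = random.Random(seed)
    true_t = [0.0]
    for _ in range(N_CELLS - 1):
        width = NOMINAL_DT_PS * (1.0 + rng.uniform(-SCATTER, SCATTER))
        true_t.append(true_t[-1] + width)
    nominal_t = [i * NOMINAL_DT_PS for i in range(N_CELLS)]
    omega = 2.0 * math.pi * F_GHZ * 1e-3            # radians per picosecond
    uncorrected, corrected = [], []
    for _ in range(n_events):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        volts = [math.sin(omega * t + phase) for t in true_t]
        for times, periods in ((nominal_t, uncorrected), (true_t, corrected)):
            zc = rising_crossings(times, volts)
            periods.extend(zc[k + 1] - zc[k] for k in range(len(zc) - 1))
    return statistics.pstdev(uncorrected), statistics.pstdev(corrected)

raw_ps, cal_ps = period_spread()
single_edge_ps = 28.0 / math.sqrt(2.0)   # two-edge to single-edge conversion
print(f"period spread, nominal bins:    {raw_ps:.1f} ps")
print(f"period spread, calibrated bins: {cal_ps:.1f} ps")
print(f"28 ps / sqrt(2) = {single_edge_ps:.1f} ps")
```

Even with perfectly known bin widths, the corrected spread in this sketch does not vanish, because linear interpolation of a sine sampled only about four times per period is itself imperfect. This is one simple example of an irreducible error tied to the extraction method.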
[Figure panel title: "Bin Width Est. (non-ideal bins)"]

Fig. 19. Monte Carlo.

These determined parameters appear to be stable
in time and depend only upon thermal effects,
which are described next.



4.4. Temperature Dependence 

The voltage controlled oscillator (VCO) for the write
pointer is fundamentally temperature dependent.
During operation of LAB1 an external delay-locked
loop circuit was used to adjust ROVDD to compensate.
However, this circuit suffered from large
phase noise as well as a nasty habit of locking onto
a frequency subharmonic at power-on. Therefore,
with the addition of a dedicated timing channel
(needed to precisely align multiple LAB3 waveforms
offline), ROVDD was fixed and timebase correction
is implemented by fitting to the period of the
reference clock.

The temperature dependence of the sampled fre- 
quency is shown in Fig. 20. Good agreement is 
seen with SPICE simulations of the temperature 
dependence, once an operating reference point is 
set. This fine tuning is needed to correct for the 
overly pessimistic parasitic capacitance estimate 
used earlier in simulating the ripple oscillator fre- 
quency. Using the reference clock signal on chan- 
nel 9, this temperature dependence of the VCO is 
corrected in the offline analysis. A fit to this de- 
pendence gives a change of approximately 55ps/°C 
over the 30ns period of the 33MHz reference clock, 
or about 0.2%/°C. 
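
The arithmetic behind the quoted fractional slope, together with one simple way a reference-clock fit could be used to rescale the timebase, is sketched below. The single global scale factor is an assumption made for illustration and is not a description of the actual offline analysis; the function names and the example measured period are hypothetical.

```python
REF_PERIOD_NS = 30.0        # period of the 33 MHz reference clock, as quoted
SLOPE_PS_PER_DEGC = 55.0    # fitted change of that period per degree Celsius

def fractional_slope_percent():
    """Temperature coefficient of the timebase, in percent per degree Celsius."""
    return 100.0 * (SLOPE_PS_PER_DEGC * 1e-3) / REF_PERIOD_NS

def timebase_scale(measured_ref_period_ns):
    """Scale factor mapping the apparent reference period back to 30 ns."""
    return REF_PERIOD_NS / measured_ref_period_ns

def correct_times(times_ns, measured_ref_period_ns):
    """Apply one global scale factor to every sample time of an event."""
    scale = timebase_scale(measured_ref_period_ns)
    return [t * scale for t in times_ns]

print(f"{fractional_slope_percent():.2f} %/degC")
# Hypothetical event in which the reference clock appears to have a 30.55 ns period.
print(correct_times([0.0, 15.275, 30.55], 30.55))
```

The first value evaluates to roughly 0.18 %/°C, which rounds to the 0.2 %/°C quoted above.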



Fig. 20. Measured and SPICE simulated temperature de- 
pendence of the LAB3 sampling period. 

In contrast, the pedestals are a very weak function
of temperature. Fig. 21 displays the difference
in pedestal values taken after an ambient
temperature change of approximately 17°C.

[Figure panel title: "Pedestal Stability"]

Fig. 21. Difference in pedestal between dedicated pedestal
runs taken 30 hours apart, at a difference in ambient
temperature of 17 degrees Celsius.

Taking this difference, an estimate of the
pedestal temperature dependence is

\[ \mathrm{PED}_{T} = -0.052\ \frac{\mathrm{ADC\ counts}}{^{\circ}\mathrm{C}} \qquad (17) \]
For reference, and to illustrate the typical chip-level
noise, an example noise run is shown in
Fig. 22. Representative noise values are about
1.3mV rms, though there is some non-gaussian
behavior in the combined distribution of 2.2 million
samples from 9 separate RF channels.
Fig. 22. Sample LAB3 1k event noise run, with all 9 channels
combined into a single distribution.



4.5. Interleaved Sampling 

A UHF RF sine wave signal sampled right at the
Nyquist rate visually appears undersampled to most
observers. This is due to expectations from seeing
the smooth curves generated by 20+ GHz offset
interleaved sampling of a repetitive waveform, as
provided by most digital signal oscilloscopes. By
providing precise external delays it is possible to
enhance the effective sampling speed and provide
oversampling with the LAB3 chip. Interleaving of 8
inputs, each running at 2.5GSa/s, has been done to
provide single-shot recording of a 400MHz sine wave
signal at 20GSa/s, as shown in Fig. 23. Each color
represents the samples recorded by a single channel.

While there is some scatter due to the delays not 
being perfectly tuned, this indicates that there is 
still more performance to be gained by increasing 
the analog bandwidth yet further and implement- 
ing such interleaving. For low power and very high 
sampling rate applications, where signals may not 
be repetitive, this technique may be useful. This 
and other improvements are considered next. 



[Figure: horizontal axis "Time [ns]"]

Fig. 23. Example of 20GSa/s interleaved, single-shot waveform
recording of a 400MHz sine wave signal on 8 LAB3
input channels, each plotted with a different color.
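
The interleaving arithmetic can be made concrete with a short sketch. It models each channel's external delay as an ideal offset of its effective sample times, so that 8 channels at 2.5GSa/s with 50ps steps merge into one 20GSa/s record. The perfectly tuned delays and the function name `interleave` are assumptions for illustration; real delays show scatter, as noted above.

```python
import math

N_CHANNELS = 8
FS_CHANNEL_GSA = 2.5                      # per-channel sampling rate
F_SIGNAL_GHZ = 0.4                        # 400 MHz sine wave
DT_CHANNEL_NS = 1.0 / FS_CHANNEL_GSA      # 0.4 ns between samples on one channel
DELAY_STEP_NS = DT_CHANNEL_NS / N_CHANNELS  # 0.05 ns offset between channels

def interleave(n_samples_per_channel):
    """Merge all channels into one list of (time_ns, channel, volts), time ordered."""
    points = []
    for ch in range(N_CHANNELS):
        offset = ch * DELAY_STEP_NS       # ideal, perfectly tuned external delay
        for k in range(n_samples_per_channel):
            t = k * DT_CHANNEL_NS + offset
            v = math.sin(2.0 * math.pi * F_SIGNAL_GHZ * t)
            points.append((t, ch, v))
    points.sort()
    return points

record = interleave(10)
spacing_ns = record[1][0] - record[0][0]
print(f"effective rate: {1.0 / spacing_ns:.1f} GSa/s")
print([ch for _, ch, _ in record[:8]])    # channel order within one 0.4 ns step
```

Keeping the channel index alongside each point mirrors the per-channel coloring of Fig. 23 and makes it easy to see which channel a mistimed sample came from.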

5. Future Directions 



Beyond increasing analog bandwidth, to fully
exploit the enhanced sampling speed of deep
sub-micron processes, there is a desire to increase
sampling depth. This is being explored in a follow-on
device designated the Buffered LABRADOR
(BLAB) ASIC. While the sampling speed increases
below 0.25μm, loss of dynamic range due to
reduced rail voltages and increased leakage current
may preclude going to smaller feature sizes.



6. Applications 

During December 2006 to January 2007, 36
LAB3 ASICs flew successfully at 120,000 feet for
35 days around the Antarctic continent. Some of
the test results shown above are from this data set.
During this same period, test deployments for an
in-ice radio detector using this device were made
at the South Pole in conjunction with the IceCube
array [16]. Recently, these ASICs were evaluated
in a colliding beam environment for an upgrade
of the Belle Time-Of-Flight readout [17], and a
variant for operation at a Super B-factory [18] is
being developed for high timing precision, single
photon recording [19]. For these future applications,
a deeper sampling depth is highly desirable
and such a device is currently being prototyped.






7. Summary 

A Switched Capacitor Array (SCA) device has
been developed in a 0.25μm CMOS process with
a 3dB analog bandwidth of almost a gigahertz,
capable of being sampled at many GSa/s, or well
above the Nyquist minimum. Sampling is performed
at low power and the entire array of 9 channels by
260 samples can be digitized to 12 bits of resolution
and read out within 50μs. With calibration,
excellent time and sample voltage resolution have
been obtained over a large range of temperature
and sampling speeds.